Investigating Repetition and Reusability of Translations in Subtitle Corpora for use with Example-Based Machine Translation
Abstract
Repetition and reusability are two notions that lie at the heart of a number of current approaches to computer-aided translation (CAT) and machine translation (MT), but are rarely problematized in the literature. In this paper, we discuss these notions in the context of the example-based machine translation (EBMT) of movie subtitles in a project currently underway at Dublin City University. We start by describing, in sections 2 and 3 below, how repetition and reusability have been dealt with by other researchers in CAT and MT, before going on to outline our own approach in section 4. We characterize our approach as combining both ‘prospective’ and ‘retrospective’ phases. The prospective phase relies on measurements of levels of repetition in our texts, on the one hand, and human evaluations of the reusability of existing translations, on the other, while the retrospective phase relies on real-user evaluations of automatically translated subtitles after they have been inserted into relevant movie clips. While the focus of this paper is on the prospective evaluation of repetition and reusability in our corpora, we will also make some comments on why prospective measures and judgements might be expected to correlate with retrospective evaluations. We will provide some quantitative analysis results from the corpora used with the EBMT system, as well as an overview of the responses generated in the human evaluation. The paper concludes with comments on the usefulness of a prospective phase, and the benefits this has for the next phase of research.
Similar resources
Investigating the Social Practice of Persian Translations of ‘The Girl You Left Behind’ through Translators’ Lexical and Grammatical Strategies
The present study aimed to shed light on the differences in social practice between the Persian translations of The Girl You Left Behind by Jojo Moyes (2012) and the original English text, based on Fairclough's (1995) model. In this regard, through a careful analysis of the source and target texts, English social practice instances were selected along with their Persian equivalents as the cor...
Phrase-Based Machine Translation based on Simulated Annealing
In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold. First, we identify common source phrases. Then we use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem. For that we use a simulated annealing algorithm to f...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating their parts. Persian has multi-part words as well. In Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
A Comparative Study in Relation to the Translation of the Linguistic Humor
Mark Twain made use of repetition and parallelism as two comedic literary devices to bring comic effect to the readers. Linguistic devices of humor, repetition and parallelism seemed to create many difficulties in the translation of literary texts. The present study applied Delabatista's strategies for translating wordplays such as repetition and parallelism in the translation of humorous texts...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Publication date: 2007